Predicting major clinical events among Canadian adults with laboratory-confirmed influenza infection using the influenza severity scale

We developed and validated the Influenza Severity Scale (ISS), a standardized risk assessment for influenza, to estimate and predict the probability of major clinical events in patients with laboratory-confirmed infection. Data from the Canadian Immunization Research Network’s Serious Outcomes Surveillance Network (2011/2012–2018/2019 influenza seasons) enabled the selecting of all laboratory-confirmed influenza patients. A machine learning-based approach then identified variables, generated weighted scores, and evaluated model performance. This study included 12,954 patients with laboratory-confirmed influenza infections. The optimal scale encompassed ten variables: demographic (age and sex), health history (smoking status, chronic pulmonary disease, diabetes mellitus, and influenza vaccination status), clinical presentation (cough, sputum production, and shortness of breath), and function (need for regular support for activities of daily living). As a continuous variable, the scale had an AU-ROC of 0.73 (95% CI, 0.71–0.74). Aggregated scores classified participants into three risk categories: low (ISS < 30; 79.9% sensitivity, 51% specificity), moderate (ISS ≥ 30 but < 50; 54.5% sensitivity, 55.9% specificity), and high (ISS ≥ 50; 51.4% sensitivity, 80.5% specificity). ISS demonstrated a solid ability to identify patients with hospitalized laboratory-confirmed influenza at increased risk for Major Clinical Events, potentially impacting clinical practice and research.


Participants
We used data from the 2011/2012 to 2018/2019 influenza seasons, selecting all patients with laboratory-confirmed influenza infection.The present analyses used data collected during the initial assessment of the patient within the hospital, which reflects the patient's condition immediately after being hospitalized.All hospitalized patients across the full range of illness severity were included in the present analyses, including those with and without supplemental oxygen and those requiring ventilatory support and ICU admission.There were no other data filters, and we kept all cases with missing values.
The study adhered to the guidelines outlined in the Declaration of Helsinki.The Research Ethics Boards approved the protocol, including data and sample collection and medical record review at all participating institutions (ClinicalTrials.govIdentifier: NCT1517191).

Definition of influenza infection
Nasopharyngeal swab samples from all participating subjects underwent reverse transcription polymerase chain reaction (RT-PCR) influenza testing 15 .Subjects were classified as "laboratory-confirmed cases" if they tested positive for influenza or "negative controls" if they tested negative.Only laboratory-confirmed influenza cases were included in the present analyses.

Data collection
Demographic and clinical data collection followed a standardized CIRN SOS Network protocol described elsewhere 13,16 .A broad set of variables from the SOS dataset were fed into model development.Demographic data included sex and age.Health-related data included: smoking status, clinical symptoms and signs (feverishness, nasal congestion, headache, abdominal pain, malaise, cough, diarrhea, weakness, shortness of breath, vomiting, dizziness, sore throat, nausea, muscle aches, arthralgia, prostration, seizures, myalgia, sneezing, conjunctivitis, sputum production, chest pain, encephalitis, nose bleed, altered consciousness, chills, and anorexia), function i.e. degree of dependence on activities of daily living (transferring, ambulating, need for assistive devices to ambulate, balance, bathing, toileting, handling medications, dressing, eating, handling finances), need for regular support for activities of daily living, need for additional support for activities of daily living, sensory disturbances (vision, hearing, and speech), bladder and bowel dysfunction, appetite disturbances, and comorbidities (ischemic heart disease, cardiac arrhythmias, valvular disease, congestive heart failure, hypertension, peripheral vascular disease, cerebrovascular disease, dementia, other noncognitive neurological disorders, hemiplegia/paraplegia, chronic pulmonary disease, pulmonary vascular disease, rheumatological disease, peptic ulcer disease, liver disease, diabetes mellitus, solid tumor, any type of metastatic cancer, HIV/AIDS, hypothyroidism, lymphoma, coagulopathy, blood loss anemia, deficiency anemia, alcohol abuse, drug abuse, obesity, involuntary weight loss, fluid and electrolyte disorders, edema, any psychiatric disease, depression, and peripheral skin ulcers).Influenza vaccination status was deemed "vaccinated" for those who received a current season flu vaccine more than 14 days before the onset of symptoms and "unvaccinated" if otherwise.Data collection was done by on-site study monitors who obtained the data for each patient based on the best possible source, including a review of patient charts or medical records and interviews with patients, family members, and healthcare team members where required.Influenza vaccination status was verified using medical records or registries where available or through contact with the immunizing health care professional.www.nature.com/scientificreports/

Outcomes of interest
The outcome of interest in this sample of patients was defined as the occurrence of a Major Clinical Event (MCE).We chose this outcome as it is a specific and measurable health event for which all study subjects were at risk at the time of hospitalization.MCE was defined as a composite outcome of the need for supplemental oxygen therapy, admission to an intermediate care unit, need for non-invasive or invasive ventilation, admission to an intensive care unit, or death.To assess the diagnostic accuracy of the scale in predicting MCE outcome, two steps were taken: (1) assessing the scale's overall diagnostic performance as a continuous variable and (2) grouping scores into three risk categories (low, moderate, and high) based on sensitivity and specificity values.The population's risk level must be considered when deciding on the test's minimum sensitivity and specificity levels.For low-risk individuals, a minimum specificity of 50% and maximum sensitivity should be chosen to ensure that those at low risk are identified without reducing the detectability of those at higher risk.For high-risk individuals, a minimum sensitivity of 50% and maximum specificity should be selected to ensure that those at higher risk are detected without reducing the screening of those at lower risk.The remaining scores were classified as moderate risk.

Development workflow
Supplementary material Fig. 1 describes the workflow scheme.

Data preprocessing
We used a data-splitting approach to validate our findings.As per the studies conducted by Dobbin and Simon 17 and Nguyen et al. 18 , a train-test splitting ratio of around 30% is considered reasonable.To provide an unbiased evaluation of the model fit on the training dataset while fine-tuning model hyperparameters, we also held out about 15% of the training set as a validation set.We randomly divided the total sample into three sets-a training set (60%), a validation set (10%), and a test set (30%), each stratified based on their MCE status to ensure an equal balance between groups.We then transformed the raw data into a valuable and efficient format: missing values were kept at an "Unknown" level, categorical variables were converted into dummy variables, and the continuous variable (age) was centralized and standardized.The variable imbalances between sets were evaluated using standardized mean differences and differences in proportion.A strict threshold of 0.05 was applied to indicate significant imbalances between the groups 19 .

Variable selection
We utilized the training and validation sets for variable selection.We modeled the outcome as a function of all predicting variables using Random Forest, which generated a list of importance rankings based on the Gini index.Next, we applied the Random Forest algorithm and tenfold cross-validation to calculate the Area Under the Receiver Operating Characteristic Curve (AU-ROC).We progressively included variables from the list, starting with the highest rank, until we reached a saturation threshold of AU-ROC variance ≤ 1% for two consecutive iterations.www.nature.com/scientificreports/Generating weighted scores Logistic Regression was applied to generate weighted scores, modeling MCE as a function of the chosen variables.Each variable's β coefficient was divided by the lowest β coefficient and rounded to the nearest value.The sum of scores for each category gave the total score normalized to a range of 0-100 for practicality, with 100 representing the highest risk for MCE and 0 denoting a zero risk.

Model evaluation
We evaluated the weighted scores' ability to predict MCE via four methods: Penalized Logistic Regression (PLR), Classification and Regression Trees (CART), Random Forest (RF), and eXtreme Gradient Boosting (XGBoost).
We employed a stratified ten-fold cross-validation for each predictive model to select the ideal hyperparameter combination using grid search, then trained each model individually.

Evaluation metrics
We developed Receiver Operating Characteristic (ROC), Precision-Recall (PR), and Gain curves for each prediction model to evaluate the performance of the four algorithms on the test dataset.We employed six traditional metrics (AU-ROC, Area Under the Precision-Recall Curve [AU-PRC], sensitivity, specificity, precision, and F1 score) to generate predicted classes.Then, we determined point-estimated metrics by cross-tabulating the observed and predicted classes.We selected the best model based on the best performance in the ROC and PR spaces 20 .

Comparing performance across different models
Table 2 reveals similar accuracy of predictive models on the training set, with minor disparities in AU-ROC.We chose not to exclude any models before assessing their performance on the test set.Figure 1 displays the ROC and PR curves of the prediction models on the test set, with individual ROC and gain curves in the Supplementary material (Figs.5-8).Results show that all models had acceptable discrimination performance, ranging from 69.2 to 73.1% in AU-ROC, similar to those in their training set.Examining the ROC and PR curves, the Penalized Logistic Regression and eXtreme Gradient Boosting models demonstrate the best overall performance when using the ISS as a continuous variable.

Finding the optimal cutoff points
When tested on the set, the ISS attained a remarkable AU-ROC of 0.73 (95% CI, 0.71-0.74).When applying the Youden index to determine the optimal score threshold, it revealed a value of ≥ 37, leading to a sensitivity of 0.70 (95% CI, 0.68-0.72)and a specificity of 0.63 (95% CI, 0.60-0.65).To interpret the scores simpler, Supplementary Table 2

Discussion
This study demonstrates the development and assessment of accuracy for the Influenza Severity Scale.This 10-item scale was designed to differentiate patients infected with influenza by their risk of experiencing major clinical events.The ISS is based on simple, easily accessible data and discriminates accurately.This emphasizes the www.nature.com/scientificreports/utility of the ISS for both retrospective and prospective studies in accurately assessing and predicting influenzarelated outcomes.Adams and colleagues reviewed common severity assessments for influenza and community-acquired pneumonia 7 .They examined 118 studies focusing on influenza, which included evaluations of tools such as PSI, CURB 65, APACHE II, SOFA, and qSOFA [21][22][23][24][25][26][27][28][29][30][31][32][33][34][35] .The clinical outcomes studied in these assessments involved mortality rates (overall, in ICU, in hospital), ICU admissions, mechanical ventilation needs, length of hospital stays, and total hospitalizations.Despite the extensive findings, none of the assessments were specifically designed for influenza or considered other critical outcomes like admission to intermediate care units or the need for oxygen therapy or non-invasive ventilation.Additionally, significant differences were observed in the clinical parameters and research endpoints across the studies reviewed.These variations made it challenging to combine results and www.nature.com/scientificreports/perform a meta-analysis accurately.Consequently, it hindered the authors from making precise evaluations of diagnostic performance and conducting direct comparisons between different assessment tools.Nevertheless, although PSI and CURB-65 are generally reliable in predicting 30-day mortality rates for community-acquired pneumonia in different clinical settings, some studies suggest that they may not be effective in predicting mortality rates for influenza pneumonia cases.For example, in a study conducted by Riquelme et al. 36 , these pneumonia scoring systems were ineffective in predicting the survival rate of low-risk patients with the H1N1 2009 influenza pandemic.Another study revealed that these scoring systems are still inefficient for the influenza pandemic because they cannot accurately predict the need for intensive care services 37 .The study revealed promising findings for SMRT-CO in identifying low-risk patients, with an AU-ROC of 0.826 37 .
In contrast, other studies found that the AU-ROC values for predicting mortality in patients with influenza A were 0.777 for CURB-65 and 0.560 for PSI 22 .This study proposed the FluA-p score as a novel approach to predict mortality in patients with influenza A-related pneumonia, achieving an AU-ROC of 0.908 22 .However, despite its high AU-ROC, the FluA-p score relies on laboratory variables as risk parameters, similar to the SOFA score.This reliance may present challenges when incorporating it into environments with limited resources or assessing severity in epidemiological studies that often lack access to laboratory data.
When analyzing medical research data with categorical outcomes, it's crucial to consider performance metrics.While the ROC curve is commonly used to assess test performance, dealing with imbalanced datasets can distort results.Combining ROC and Precision-Recall (PR) curves along with their respective AUC measurements (AU ROC and AU PR) is recommended to address this issue.Surprisingly, PR curves are often overlooked in diagnostic performance studies in influenza patients.To fill this gap, we assessed ISS's discriminative ability using ROC and PR curves.The ISS demonstrated strong discriminative performance in the test set, achieving AU ROC and AU PR values exceeding 70%.Remarkably, these results were achieved without using any laboratory parameters and by including patients from various sites to minimize bias stemming from local practices.
Estimating disease severity involves assessing the probability of significant clinical events among those individuals who are at risk but have not yet experienced any at the start of the observation period.This probability is determined by a set of parameters, some of which can be modified while others cannot.The severity assessment must consider modifiable and non-modifiable parameters, with the latter being the most important.Modifiable parameters may not always lead to a decrease in the risk of clinical events, but they can indicate how an intervention, disease progression, or host response will affect the baseline risk.Even if the modifiable parameters do not indicate a high risk, this does not necessarily mean the individual's baseline risk has changed.Instead, it reflects the natural history of the disease and how it interacts with the host.Therefore, individuals must be stratified based on modifiable and non-modifiable risks.Our tool considers this, allowing us to gain insight into the progression of the disease and the host's response to the infection.At the same time, it recognizes and respects the individual's particularities and dynamic responses.
To ensure the most effective risk management strategies, ISS utilizes a 3-threshold system to differentiate between low-and high-risk patients when measuring the likelihood of major clinical events.These thresholds were specifically chosen to ensure a high sensitivity and specificity rate exceeding 80%.This 3-tier system facilitates risk-based treatment protocols and permits a more centralized and cost-effective care distribution.For medical personnel, this allows for treatments and services to be tailored to the specific needs of patients, enabling higher-quality care and more successful patient outcomes.During epidemiological studies, the use of ISS enables the identification of the disease severity and its association with a particular strain.It can also be used as an adjustment measure when assessing the effectiveness of interventions.
Our study has several limitations that must be considered.Firstly, the data we used was obtained from hospital surveillance, which inherently limits our ability to account for specific institutional protocols.Our data consisted of the initial assessment of the patient within the hospital, which reflects the patient's condition immediately after being hospitalized.While using this approach, we might have lost the ability to capture the scale's performance in predicting the need for hospitalization.However, we could still track all significant clinical events that occurred afterward.Also, we have not had lower tract samples, meaning we may not have identified some people with severe disease, thus skewing the results.Moreover, our tool's reliance on non-modifiable parameters limits its capacity to be used as an evolutionary or responsive variable during hospitalization, as it does not concentrate www.nature.com/scientificreports/ on physiological parameters.Unsurprisingly, the absence of modifiable parameters, such as vital signs, laboratory, and radiological variables, impacted the discriminatory performance of our tool.Future studies should include these variables to enhance its performance while ensuring a balance with non-modifiable parameters.Some rare characteristics are likely underrepresented in our cohort, leading to an insufficient data sample to identify them as significant factors influencing ISS.Nonetheless, they could be relevant in other settings, also warranting further validation.Lastly, ISS comprises ten variables and individual scores, which can be challenging to remember in clinical practice.Ideally, future work could focus on developing automated means of collecting and calculating the ISS to support its use in research and clinical settings.Despite these drawbacks, our dataset was sizeable, multi-centric, and included a wide variety of people who are generally not included in these kinds of studies, such as individuals on the extremes of age with or without comorbidities.In summary, the ISS is a tool used to assess the severity of influenza infection.It considers the patient's symptoms and non-modifiable parameters to predict the likelihood of MCE without requiring lab tests other than confirming influenza infection.This can help direct protocols and policies to those at greater risk of MCE, such as older adults, women, smokers, CPD patients, diabetics, and those not vaccinated against influenza.It also highlights the importance of considering multiple factors (and their intersection) contributing to a person's risk rather than individual factors considered singly.
Additionally, the tool can raise awareness of important symptoms that predict worse outcomes, namely (in order of importance) shortness of breath, sputum production, and coughing, as well as important clinical features such as underlying health conditions and functional status.Notably, the most important clinical factors identified here were underlying chronic lung disease, shortness of breath, and baseline functional impairment with the requirement for support in activities of daily living.These are relevant and readily identifiable factors that impact clinical prognostication and decision-making.For example, a 70-year-old patient with baseline functional impairment who presents with shortness of breath is at high risk for MCE from influenza, and this can be communicated to the patient and family early in their admission.
We believe ISS will be essential for public health systems to monitor the effects of public health protocols on clinical outcomes and establish efficient surveillance measures.Further research is necessary to explore the utility of ISS in different clinical settings and its capacity to predict mortality.Ultimately, the ISS may prove to be a valuable metric for assessing and improving influenza-related health outcomes, contributing to the betterment of public health.

Figure 1 .
Figure 1.The (A) ROC and (B) PR curves of the models in predicting the occurrence of MCE on the test set.

Figure 2 .
Figure 2. The histogram of the ISS scores by risk categories overlaps a density plot by the outcome MCE.Sturges' Rule determined the optimal number of bins.

Table 2 .
Performance of the models in predicting the occurrence of MCE on the train and test sets.*AU-ROC, Area Under the Receiver Operating Characteristic curve.**AU-PRC, Area Under the Precision-Recall curve.